Longest common subsequence between run-length-encoded strings: a new algorithm with improved parallelism
Abstract
Data compression can be used to simultaneously reduce the memory, communication and computation requirements of string comparison. In this paper we address the problem of computing the length of the longest common subsequence (LCS) between run-length-encoded (RLE) strings. We exploit RLE both to reduce the complexity of LCS computation from O(M × N) to O(mN + Mn − mn), where M and N are the lengths of the original strings and m and n are the numbers of runs in their RLE representations, and to improve the inherent parallelism of the proposed algorithm, so that it may execute in O(m + n) steps on a systolic array of M + N units. We also discuss the application of the proposed algorithm to the related problem of edit distance (ED) computation. © 2004 Elsevier B.V. All rights reserved.
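To make the quantities in the abstract concrete, here is a minimal Python sketch. It is not the paper's algorithm: it only run-length encodes two strings, computes the LCS length with the classic O(M × N) dynamic program that the abstract takes as its baseline, and evaluates both cost expressions. The function names (`rle_encode`, `lcs_length`, `cell_counts`) are hypothetical, chosen for illustration. One way to read mN + Mn − mn, offered here as an assumption rather than a claim about the paper, is as the number of dynamic-programming cells lying on the last row or last column of a block induced by the runs.

```python
from itertools import groupby


def rle_encode(s):
    """Return the run-length encoding of s as (character, run length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def lcs_length(x, y):
    """Classic O(M * N) dynamic program for the LCS length (baseline only)."""
    prev = [0] * (len(y) + 1)  # DP row for the previous character of x
    for cx in x:
        curr = [0]  # column 0: LCS against the empty prefix of y
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def cell_counts(x, y):
    """Return (M * N, m * N + M * n - m * n) for strings x and y.

    M, N are the string lengths; m, n are the numbers of runs.
    """
    M, N = len(x), len(y)
    m, n = len(rle_encode(x)), len(rle_encode(y))
    return M * N, m * N + M * n - m * n


x, y = "aaabbbcc", "aabbbbc"
print(rle_encode(x))      # [('a', 3), ('b', 3), ('c', 2)]
print(lcs_length(x, y))   # 6
print(cell_counts(x, y))  # (56, 36)
```

For these inputs M = 8, N = 7 and m = n = 3, so the full table has 56 cells while the RLE-based expression evaluates to 36. The gap widens as runs get longer relative to the string lengths.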
Related resources
Matching for Run-Length Encoded Strings
Measuring the similarity between two strings, through such standard measures as Hamming distance, edit distance, and longest common subsequence, is one of the fundamental problems in pattern matching. We consider the problem of finding the longest common subsequence of two strings. A well-known dynamic programming algorithm computes the longest common subsequence of strings X and Y i...
A fast and simple algorithm for computing the longest common subsequence of run-length encoded strings
Let X and Y be two strings of lengths n and m, respectively, and k and l, respectively, be the numbers of runs in their corresponding run-length encoded forms. We propose a simple algorithm for computing the longest common subsequence of two given strings X and Y in O(kl + min{p1, p2}) time, where p1 and p2 denote the numbers of elements in the botto...
Finding a longest common subsequence between a run-length-encoded string and an uncompressed string
In this paper, we propose an O(min{mN, Mn}) time algorithm for finding a longest common subsequence of strings X and Y with lengths M and N, respectively, and run-length-encoded lengths m and n, respectively. We propose a new recursive formula for finding a longest common subsequence of Y and X which is in the run-length-encoded format. That is, Y = y_1 y_2 ··· y_N and X = r_1^{l_1} r_2^{l_2} ··· r_m^{l_m}, where ri i...
Fast Algorithms for Computing the Constrained LCS of Run-Length Encoded Strings
In the constrained longest common subsequence (CLCS) problem, we are given two sequences X, Y and the constrained sequence P in run-length encoded (RLE) format, where |X| = n, |Y| = m and |P| = r and the numbers of runs in RLE format are N, M and R, respectively. In this paper, we show that after the sequences are encoded, the CLCS problem can be solved in O(NMr + r × min{q1, q2} + q3) time,...
A hardness result and new algorithm for the longest common palindromic subsequence problem
The 2-LCPS problem, first introduced by Chowdhury et al. [Fundam. Inform., 129(4):329–340, 2014], asks one to compute (the length of) a longest palindromic common subsequence between two given strings A and B. We show that the 2-LCPS problem is at least as hard as the well-studied longest common subsequence problem for 4 strings. Then, we present a new algorithm which solves the 2-LCPS problem ...
Journal title:
- Inf. Process. Lett.
Volume 90 Issue
Pages -
Publication date 2004